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Abstract 

This paper describes an efficient technique for estimating, via simulation, the prob- 
ability of buffer overflows in a queueing model that arises in the analysis of ATM 
(Asynchronous Transfer Mode) communication switches. There are multiple streams 
of (autocorrelated) traffic feeding the switch that has a buffer of finite capacity. Each 
stream is designated as either being of high or low priority. When the queue length 
reaches a certain threshold, only high priority packets are admitted to the switch’s 
buffer. The problem is to estimate the loss rate of high priority packets. An asymptoti- 
cally optimal importance sampling approach is developed for this rare event simulation 
problem. In this approach, the importance sampling is done in two distinct phases. In 
the first phase, an importance sampling change of measure is used to bring the queue 
length up to the threshold at which low priority packets get rejected. In the second 
phase, a different importance sampling change of measure is used to move the queue 
length from the threshold to the buffer capacity. 
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1 Introduction 


This paper is concerned with efficient simulation techniques for estimating very low packet 
loss rates, e.g. less than 10“ 9 , in models of ATM (Asynchronous Transfer Mode) commu- 
nications switches. Such low packet loss rates are deemed a requirement in ATM networks 
and there is currently great interest in quantitative methods for analyzing the performance 
of ATM models. Standard simulation of such rare events is known to take a prohibitive 
amount of time. Therefore, research has focused on how to apply a general technique called 
importance sampling [23, 25] so as to speed up the simulation of such rare events. A survey, 
with numerous references, on the application of importance sampling to rare event simu- 
lation of models in queueing and reliability theory is given in [26], while a survey on fast 
simulation of reliability models is given in [32]. Relevant references on the use of importance 
sampling in queueing models include [1, 2, 3, 17, 18, 19, 22, 28, 29, 33, 34, 35, 37]. 

In this paper, we consider simulation of a single switch, the architecture and control of 
which has been proposed for use in ATM networks [15, 16]. The switch operates in discrete 
time. It is fed by K (K > 1) independent streams of traffic. Each stream is designated as 
being either a high priority or low priority stream. The switch can service up to c (c > 1) 
packets in each unit of time. The switch has a buffer of size B = B j + B 2 . Whenever 
the queue length (buffer contents) is less than Bi , then both high and low priority traffic 
streams are accepted by the switch. However, when the queue length is between B\ and 
B] + B 2 , low priority arrivals are rejected and only high priority arrivals are admitted to 
the buffer. When the queue length exceeds Bi + B 2 , both low and high priority packets are 
rejected. We will consider estimating the loss rate of high priority packets. 

In order to model realistic situations, it is necessary to consider the case when the arrival 
streams contain autocorrelation, as may occur in the transfer of video data. Although 
somewhat more general arrival processes can be handled, for simplicity of notation, we will 
consider the case when each arrival stream is governed by a Markov chain. If the number of 
sources K is large, and if fli + B 2 is large, then exact analysis of this model is numerically 
infeasible due to the problem of state space size explosion. While a “fluid” approximation 
to this “cell” based model can be formulated and solved (if K is not too large) [15, 16], 
the difference between performance measures of the fluid and cell models is neither well 
understood nor easily quantified. (Indeed, one application of this paper is to facilitate such 
an understanding.) 

Fast simulation of such a switch without priorities (i.e., all packets are high priority 
and B 2 = 0) was considered in [9, 10]. In those papers, the asymptotically optimal (as 
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B — ► oo ) importance sampling change of measure was described. (Here, asymptotically 
optimal means that the variance of the estimate goes to zero at the fastest possible rate.) 
This change of measure is closely related to the Gartner- Ellis theorem and the theory of large 
deviations for Markov additive processes [3, 6, 7, 13, 20, 30, 36]. The relationship between 
“effective bandwidths” (see, e.g., [8, 14, 21, 24, 27, 39]) and asymptotically optimal fast 
simulation is also explored in [9, 10], (The theory of effective bandwidths deals with the 
rate at which large queue length probabilities decay.) 

In this paper, we show how the results of [9, 10] can be extended to the switch with 
priorities. The extension is unusual in several respects. First, it requires piecing together 
two different importance sampling changes of measure: the first involving both high and low 
priority packets so as to bring the queue length up to level Z?i, the second involving only the 
high priority packets so as to increase the queue length from B\ to B\ + £ 2 - Assuming that 
the queue is stable when fed with both high and low priority traffic, both of these events 
are rare and need to be simulated differently. Second, additional care has to be taken in 
this second phase, since once the system reaches Bi, it may hover there for some period 
before ascending to B\ + B 2 (assuming it does reach the higher level). In this paper, we 
describe an asymptotically optimal (as both B\ and B 2 — ► oo) simulation algorithm for this 
problem . 

The rest of the paper is organized as follows. In Section 2, the general technique of 
importance sampling and its application to the single priority switch is reviewed. In Section 
3 we describe how to extend this approach to the switch with priorities. Experimental results 
are given in Section 4. The paper is then summarized in Section 5. 

2 Background 

To demonstrate the difficulty involved in simulating rare events, consider a simple example 
of estimating the probability 7# of an event 1Z& that becomes rarer and rarer as B — ► 00, 
i.e., lim£— 00 7 b = 0. Suppose, for simplicity, that 7 b can be written a s 

7 b = J 1 {xen B }P(x)dx = E p [l {Xe n B }\ (1) 

where the subscript p denotes sampling from the given input density p and is the 

indicator of the rare set Hb, i.e., I{xe7j e } = 1 if x e Hb and \{ x ^n B ) = 0 if x & Hb- 
The standard simulation estimate of 7 b is 7 b = (1/iV) J2n=i where X^, . . Xn are 
sampled from the density p and I n = 1 {a-„ 672 b}- Then £ p [ 7 b] = 7 and the variance of 7 B 
is 7 b (1 - 7fi)/iV. The relative error, defined to be the standard deviation divided by the 
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expected value, is proportional to I/Vt^N which, for any fixed sample size N , approaches 
oo as B — ► oo. This means that very large sample sizes are required to achieve accurate 
estimates of 7# as B — ► 00. 

Importance sampling, when applied properly, can avoid this problem. Consider the 
integral representation for 7 b in Equation 1. Multiplying and dividing the integrand by 
another density function p\x ) yields 

7B = y \{ x€ n B }^j^p'{x)dx = £y = E pl [\ {Xeng} L{X)] (2) 

where L(x) = p(z)/p'(a:) is called the likelihood ratio and the subscript p f denotes sampling 
from the density p f . Equation 2 is valid so long as p*(x) > 0 for all x £ 7Z& such that 
p(x) > 0. Importance sampling is based on Equation 2 and can be described as follows. 
Draw N samples X\ , using density p f and define Z n = L n I n where B n = L(.Y n ). 

Then, by Equation 2, £y[Z n ] = 7 b- Thus an unbiased (and strongly consistent as TV — + 00) 
estimate of 7 b is given by 

7b(p ) = jy ^ ^ n ^ n ’ (3) 

n= 1 n=l 

i.e., 7 b can be estimated by simulating a random variable with a different density and 
unbiasing the output by multiplying by the likelihood ratio. Most papers on importance 
sampling deal with how to choose a change of measure (in this case the density p\x)) so 
as to obtain variance reduction. In general, importance sampling must be applied with 
great care, since there are examples when it increases the variance, and, in some cases, it 
can even make the variance infinite. However, variance reduction is obtained by making 
the likelihood ratio small on 7 £b, which, roughly speaking, is accomplished by making A*) 
large for x £ 1Z&, i.e., by picking p f so as to make the rare event 7 Zb likely to happen. 
Suppose there are constants d\ and d<i and a function f(B ) such that /(B) — * 0 as B — ► 00 
and 

dif(B) < L(X) < d 2 f(B) (4) 

for all X £ 1Z&, and P f { 1Z&) — Ep*[l{X€ll B }] stays bounded away from zero as B — ► 00. 
Then 7 b = E p *[L n I n ] > d\f(B)P , (TZ 3 ) and E p \{L n I n ) 2 ] < (d 2 /(B)) 2 . Combining these 
two facts implies that the relative error of 7 b{p*) remains bounded as B — ► 00, i.e., 

Standard Deviation[7B(i/)l . x 

hm sup ^^<00 (5) 

B-+ 00 7 B 

and we say the importance sampling change of measure p f has “bounded relative error.” 
This implies that only a fixed sample size is required to obtain estimates of 7 b of a given 
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precision, no matter how rare 7 Zb is. Such an estimate is also asymptotically optimal, in 
the sense that its second moment approaches zero at the fastest possible rate, [f(B)] 2 . Note 
that bounded relative error is also obtained if d\f(B) < L(X) < Df(B) for all X £ TZ& 
where D is a random variable such that E p f[D 2 ] < oo. 

To estimate the steady state packet loss rate, r, we exploit a ratio formula [5, 11] that is 
an extension of the ratio formula that holds in regenerative systems [12, 38], (While, with 
multiple Markovian sources the model may theoretically regenerative, the regenerations may 
occur so infrequently as to make the regenerative method useless for practical purposes.) 
For some subset, A, of the state space define ,4-cycles to begin whenever the process enters 
A. (We will choose A to correspond to an empty buffer.) Then 

E[Y) 


r — 


E[a\ 


( 6 ) 


where Y is the number of packets lost during an ,4-cycle and a is the number of packets that 
arrive during an ,4-cycle. In Equation 6, the initial distribution is the stationary distribution 
conditioned on the process just entering A. Successive /1-cycles are identically distributed, 
but are not independent as they would be in the regenerative method. For large buffer sizes, 
the event 1Z& = {Y > 0} is a rare event; 1Z& is the event that, starting with an empty buffer, 
the buffer reaches size B before emptying. Importance sampling can be used to estimate 
the numerator, while standard simulation can be used to estimate the denominator. In 
particular, E[Y ] = E[Y\'R,b]P(1Zb). Estimation of E[Y] is difficult because P(TZq) is small. 
Therefore we will concentrate on asymptotically optimal procedures for estimating P(TZ&). 

We next review how to estimate P( 1Z.&) for the case of a switch of size B without 
priority classes. The model is a discrete time queueing model having K independent (po- 
tentially) heterogeneous Markovian sources. (Somewhat more general arrival processes can 
be handled; see [9, 10].) The switch can service up to c (c > 1) packets in each time 
slot. Let dk{t) denote the number of packets from source k that arrive during time slot 
2. Let a(t) = a\(t) + ... + a/c(2) denote the total number of arrivals during time slot 2 , 
Ak(t ) = dk(l) + ... + a*(2) denote the total number of source k arrivals in the interval 
[1,2], and ,4(2) = i4i(2) + . . . + Ax(t) denote the total number of arrivals during the inter- 
val [1,2]. If Q(t) denotes the queue length at time 2, then Q(t) is given by Lindley type 
recursion Q(t + 1) = [max(Q(2) + a(2 + 1), B) — c] + ; this recursion takes into account the 
finite buffer of capacity B . For source k , there is an environment variable X*(2) describing 
the state of the source. The distribution of arrivals has a Markovian structure given by 
Pk(a,j\i) = P(dk{t) = a,Xk (2) = j|A*(2 - 1) = i) (0 < a < 6, 1 < z,j < M ) where b and 
M are finite constants. To apply importance sampling, we draw on large deviations results 
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for Markov additive processes. Define the 0-conjugate process (the exponentially twisted, 
or tilted, distribution) with parameter 0, as follows. Let A*(0) be the spectral radius of the 
matrix 

b 

AkAiJ)=T, e ° ap ^j 10 ( 7 ) 

a— 0 

and let hk(i,9 ) be the corresponding eigenvector. (We assume that Ak,e(i,j) is irreducible 
so that, by the Perron-Frobenius theorem, \k{9) is real valued and hk(i,9) > 0.) Thus 

M 

£ A k ,<,(iJ)hk(j,9 ) = A*( 9)h k (i,9). (8) 

3 = 1 

The twisted distribution for source k is defined by 

= r,(.»(<) = o ,x t (t) = j\ x„(t - 1) = 0 = 0 ) 

which is seen to be a probability distribution by Equations 7 and 8. Each arrival process 
will be simulated (independently) using the same twisting parameter 0 for some as yet 
unspecified value of 0. Because of the form of Pkj(a,j\i), the likelihood ratio after T 
transitions has a simple form: 


L(0 y T) 




= n A 


( 10 ) 


= exp {-9A(T) + T £* i log(A*(tf))} H{9, T ) 


where H(9,T) = nf=, h k (X k (0),9)/ h k (X k (T),9). 

Suppose the queue is empty at time 0 and the event 7 Zb occurs at time Tg, i.e., the 
queue length first reaches or exceeds level B at time Tg and is non-empty in the interval 
[l,Tg]. Then there are exactly c x Tg departures in [l,Tg] and, since Q(T B ) > B, we 
must have A(T B ) > B + cTg. Let O(Tg) be the (nonnegative) “overshoot” defined by 
A(T b ) = 0(T b ) + B + cTg. Thus, by Equation 10 

L(9,T b ) = exp { -9B - 90(T B ) + 


£log(A fc (0))-0c 

,k = 1 


■I 


T b \ H (9, T b ). 


( 11 ) 


By setting 0 = 0* where 9* satisfies 


K 


JU s = c 


( 12 ) 


fc=l 
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(assuming it exists) results in the further simplification that 


L{9\T b ) = e-^ B - ro ^H{9\T B ). (13) 

Because the state space of each source is finite, both H{0*,Tb) and 0(Tb ) are bounded. 
Thus, on 1Z&, there exist constants d\ and d 2 such that 

d^e- e ' B < L(6*,T b ) < d 2 e- 9 ' B . (14) 


Furthermore, it can be shown that the queue is unstable when simulated with this value of 
0* and thus Po+(Hb) stays bounded away from zero. Thus, importance sampling with 0* 
is asymptotically optimal for estimating P(71b) and lima-^oo \og{P{TZs)) / P = -0*. Note 
that while the decay rate of P(JZs ), — 0*, is known, the constant in front of it is unknown; 
in essence, importance sampling is estimating this constant. 

In this example, the effective bandwidth of source k is known to be a£(0) = log(Afc(0))/0 
and the effective bandwidth of all the sources is a*(0) = a*(0) + ■ - • + «j^(0). Note that 
to find the optimal importance sampling change of measure requires solving Equation 12, 
a*(0*) = c, a nonlinear equation involving the spectral radiuses of the sources. 

A “splitting” technique [9, 31] can be used to efficiently estimate both E[Y] and E[a], 
The process is simulated (without importance sampling) until it is approximately in steady 
state. Then whenever the process enters A, two A-cycles are simulated using the same 
starting conditions; one using importance sampling and one without using importance sam- 
pling. These provide samples for the ratio estimate. Also, the A-cycle simulated without 
importance sampling provides a starting point (with approximately the steady state distri- 
bution on A) for the next pair of samples. In particular, let T t , a*, and Li be the samples of 
y, a and the likelihood ratio, respectively, obtained from the i-th pair of A — cycles. Then, 
the estimate of r is given by 


r„(<T) = 


ELi LjYj 


(15) 


Because the A— cycles are not independent and identically distributed, the method of batch 
means [4] can be used for variance estimation. 


3 Fast Simulation of the Shared Buffer Switch 

Let us now turn to the simulation of the shared buffer switch. Let 7 i denote the set of indices 
of high priority arrival streams and let C denote the set of indices of low priority arrival 
streams. Let a^(0) = be the effective bandwidth of the high priority streams 
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and let a* c (d) = Y^{keC} a k(&) effective bandwidth of the low priority streams. Note 

that a*(9) = a^(0) + a* c (0). 

We again consider efficient estimation of 1Z&, which is the event that, starting empty, 
the buffer reaches size B = B\ + #2 before becoming empty. This event entails first reaching 
level B\ and then reaching B. On an A— cycle, let J be the (random) number of upcrossings 
of level B\ \ on 72.#, J > 1. Let F be the upcrossing index on which the buffer first reaches 
fl, i.e., if the queue first reaches B after the j - th upcrossing of B\ but before the (j + 1 )-st 
upcrossing (assuming it exists), then F = j. (If the buffer level B is not hit, define F = oc.) 
Note that 

J oo oo 

= Y 1 {F=3,3<J) - X) l {F=j} ( 16 ) 

j- 1 i= l i=i 

since if F = j then j < J is automatically satisfied. Thus, assuming the A — cycle is started 
in the stationary distribution, 

oo oo oo 

P(K B ) = E[Y l l {F=j} ] = Yi E[\ {F=J} ] = Y P ( F = i). (17) 

J=1 j=l j=l 

i.e., J2j= i * s aT1 unbiased estimate of P(1Z&). Note that, given J > 1, the event that 

J > 1 may not be rare, i.e., given at least one upcrossing of B \ , there may be many up- 
crossings of B\. This complicates the application of importance sampling, because once the 
process crosses B \ , just pushing the process immediately towards B (using an appropriate 
change of measure) will not work well (since doing so will not produce accurate estimates 
of P(F = j) for j > 1). 

One may also look at this in another way. For each j , the optimal importance sampling 
distribution for estimating P(F = j) is different. Also there are infinitely many P(F = j)' s, 
so one cannot run a different simulation for estimating each of the P(F = j)' s. The 
technique described below provides a way out of this problem. 

We proceed as follows. We first use importance sampling to move the system up to 
level B\. When B\ is crossed we turn off importance sampling until the end of the A— cycle 
is reached. We call this the base path. This base path is used to generate samples of J, 
the number of upcrossings of B \ . Whenever the base path crosses B\ (including the first 
upcrossing), we “split” the base path. On upcrossing j ( j = 1,. . ., J) importance sampling 
is applied on the “j-th split path” to try to bring the system up to level B during this j - th 
upcrossing of B \ . The initial conditions of the j-th split path are the same as that of the 
base path just after it crosses B\ for the j - th time. The j - th split path is simulated until 
either B is hit or the queue length drops below B\, whichever comes first. This splitting of 
the base path is illustrated in Figure 1. 
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Let Lq denote the likelihood ratio during the buildup from 0 to B\ and Lj denote the 
likelihood ratio during the j-th split path. Define 

J oo oo 

z = ^2 LoLj^{n,rC,} = = E ^of'jl^nCy} (18) 

j = l j=i j=i 

where is the event that the j-th split path hits B before downcrossing B\ and Cj is 
the event that the base cycle has not hit B before the start of the j-th split path, i.e., 
Cj = {F > j } n { j < J}. With this notation, we associate the random variables F and 
J only with the base path, and not the split paths. Then Z/oLjl^nCj} is an unbiased 
estimate of P(F = j). This is because, the part of base path from the beginning of the 
A- cycle until the j-th split, combined with the j - th split path, may be viewed as a new 
base path (that also has J > j) that has been generated using the following change of 
measure: at the beginning of the A— cycle, push the the process towards B\ and then wait 
until B\ is upcrossed j — 1 times more before pushing it towards B. Denote this change of 
measure by p'. The random variable \{n } nc } } defined on the original A — cycle with split 
paths, is equivalent to the random variable l{/r = j} defined on this new base path. Then we 
get unbiasedness from the fact that 

Ep' } [LoLj 1 {F=j}] = £[l{F=j}] = P(F — j)- (19) 

Thus, Z is an unbiased estimate of P{TZb)- From Equation 18 we see that we need not 
generate split paths off the base path if the base path has already hit level B. 

We now turn to the specifics of how importance sampling is applied. On the build-up 
from 0 to B\ , the queue behaves identically to the queue analyzed in Section 2. Thus 
an asymptotically optimal estimate of P(J > 1) (hitting B\) is obtained by exponential 
twisting of both the high and low priority arrival streams with parameter 6\ defined to be 
the solution of 

a*W) = «wW) + 4W) = c, (20) 

assuming such a 9\ exists. (Assuming it exists, > 0 since the queue is stable.) In 
particular, there exist constants d\ and d<i such that 

die~ e ' B ' < L 0 < d 2 e- 6 ' 3 '. (21) 

Similarly, during the j- th upcrossing of B \ , the system behaves like a queue with only high 
priority arrivals building up from some level 0 3 to B<i (without going below 0) where Oj is 
the overshoot on the j- th upcrossing of B\ . Since the sources are finite, there exists some 


8 


constant d such that Oj < d. Thus exponential twisting of the high priority streams with 
parameter 6% defined to be the solution (assuming it exists) of 

2 ) = c (22) 

should be effective for large (Again, assuming it exists, > 0 since the queue is stable.) 
During this phase, low priority streams are simulated from their given distributions (and 
therefore do not contribute to the likelihood ratio). To make this more precise, if the time 
axis is renumbered so that the j - th upcrossing of B\ occurs at time 0 and level B\ + 5 2 is 
first hit on this up-crossing at time T#, then the number of high priority arrivals in [1,2#] 
is between i? 2 — d 4- T#c and J9 2 + Tqc + d, corresponding to the bounding cases that (i) 
Oj = d and the overshoot upon hitting B is zero, and (ii) Oj = 0 and the overshoot upon 
hitting B is d (its maximum), respectively. Thus when 6 = 9^ there exist constants d$ and 
d\ such that on 7 Zj fl Cj 

d 3 e~ e i B2 < Lj < d 4 e~^ B2 . (23) 

In addition, the queue is unstable when simulated with at 9\. Therefore, by Equation 18, 
the importance sampling estimate (from a single A — cycle) satisfies 

< Z < d 2 d 4 Je~ e ' B '- e * B2 . (24) 

The event TZ\ D C\ is simply the event that the base path hits B\ and the first split path 
hits B. Thus 

P(Kb) > dxd 3 P 6l e- 2 (K, r\C 1 )e~ d ' B ^ B \ (25) 

Note that Ci = {J > 1}, so P$* y $*(TZi HCi) = Pe*{'R\\J > \)Pq*(J > 1). Under importance 
sampling Pq*{1Z\\J > 1) and P$*(J > 1) are both bounded away from zero, since, as 
described above, they both represent the probability that an appropriate unstable queue 
reaches some queue length before returning to zero. (Note that the distribution of J depends 
only on 9\, since the base path turns importance sampling off after B\ is crossed for the 
first time.) Similarly, by Equation 24, 

Ee;,e;[Z 2 ] < [d 2 d 4 f E 9 >[J 2 ]e- 29 ' B ^ B2 . (26) 

Thus if, E$*[J 2 ] < oo, the importance sampling estimate of P(1Z&) has bounded relative 
error (and is thus asymptotically optimal). It is intuitively clear that this must be so, since 
(J — 1) is the number of upcrossings of level B\ of a queue with negative drift started just 
above level B ^ . A proof that E$*[J 2 ] < oo is given in Appendix A. To formalize this result, 
we state the following theorem. 
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Theorem 1 Let B i = fB and B 2 = (1 - f)B for some constant /, 0 < / < 1. // there 
exist constants 0^ > 0 and 9^ > 0 suc/i that a^O^) + = c and 2 ) = c > then 


lim 

B—> oo 


io g (f(7e g )) 

B 


lim 

£— ►oo 


Iog(^, g2 .[Z 2 ]) 

B 


-ov-m-f) 


(27) 

(28) 


i.e., the importance sampling estimate is asymptotically optimal and has bounded relative 
error. 


4 Experimental Results 

In this section we will present experimental results using the importance sampling algorithm 
described in the previous section. As is Section 2 define an A-cycle to begin whenever the 
process enters the set of states where the buffer is empty. Also, if a denotes the number 
of high priority arrivals in a typical .4 -cycle and Y denotes the number of high priority 
packets lost in an A-cycle, then the long run fraction of high priority packets lost is given 
by Equation 6. 

As mentioned in Section 2, asymptotically optimal changes of measure for estimating 
P(U B ) (where now B = B\+B 2 ) also work well for estimating E[Y], However the simulation 
procedure differs slightly. Similar to Equation 17, define 

J 

W = ^L oLjNjl^nc^ (29) 

j=i 

where Nj is the number of high priority packets lost in the j-th split path until it hits 
the buffer empty state. Then W is an unbiased estimate (from a single A-cycle) of E[Y]. 
Unlike the simulation for P(1Z B ) (where we stop simulating the j-th split path once it hits 
level B or downcrosses level B\ ) if the j-th split path hits B before it downcrosses level B \ , 
then we have to simulate it until the buffer empty state (in order to get a sample of Nj). 

We adopt the following simulation procedure. We first run enough A-cycles so that 
the process (approximately) reaches steady state. Then we start collecting samples of W 
and a using the splitting technique referred to at the end of Section 2, i.e., whenever the 
process enters A we generate two parallel cycles, one with importance sampling and the 
other without importance sampling. The one with importance sampling is used to generate 
samples of W and the one without importance sampling is used to generate samples of a 
as well as to get a starting point for the next pair of A-cycles. From n such sets of parallel 
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cycles, samples W\,. . W n and c*i , . . . , a n of W and a, respectively, are obtained. Then r is 
estimated by r n (0j,02) = !C"=i 1 However, since the successive VT,’s ( and a,’s) 

are not independent, the method of batch means is used to construct confidence intervals. 
We wifi now briefly review this procedure. 

Divide W\,. . .,W n and aj,...,a n into b batches of size k = n/b. (Assume that n/b 
is an integer.) Let Wj/k and 7 1 - = be., these are the 

sample means corresponding to the ith batch. The method of batch means works on the 
assumption that for k sufficiently large, the successive 6j’ s and 7/s are (approximately) 
normally distributed and independent. Hence if we estimate r by f = Yli = 1 1 7* 

(this is the same estimator as when we did not use batch means), then, if b is also large, 
y/b(r — r) « jV(0,<t 2 ) where 


Var[<5j] + 2rCov[£j,7j] + r 2 Var[ 7 j] 


Then an approximate 99% confidence interval can be constructed as (r — 2.56 \/a 2 jb y r + 
2.56 y/cr^/b). The relative error RE is thus defined to be 2.56 \/{& 2 /b)/r. In practice, since 
a 2 is unknown, it must be estimated, which can be done using the standard estimators of 
all the quantities in Equation 30. 

We now consider two examples. In the first example there are four input sources, three 
of which are high priority and one of which is low priority. Each source is a two state 
Markov modulated process, with a deterministic number of arrivals in each state; in state 1 
there is no arrival and in state 2 there is exactly 1 arrival. Let p\ k ^ denote the the transition 
matrix of the state of the fcth source, i.e., p\ ^ = P(Xk(t) = — 1) = i). For 1 < k < 3, 

p[ k J = 0.7 and p^ = 0.5; for k = 4, p[ k ^ = 0.9 and p 22 = 0.5. (This notation can be easily 
mapped to the notation of Section 2.) Note that the individual input streams are positively 
correlated, i.e., +£ 22 ^ > 1 f° r The total high priority and low priority arrival rate 

can be computed to be 1.125 and 0.167 respectively. We choose the switch speed c = 2. 
For such two state Markov processes, the appropriate Eigenvalues are known in closed form 
(see, e.g., [9, 10]). Using Equation 20 and Equation 22, the 6* and were computed to be 
0.75 and 2.00, respectively. 

We ran experiments for several values of B\ and B 2 ] we fixed / = 0.5 and varied B. 
Note that the Markov chain model for this example will have 8 (B + 1) states and it is not 
very difficult to solve this particular model numerically (for moderate values of B ). We use 
this model mainly as a means of comparing the effectiveness of our importance sampling 
algorithm to that of standard simulation; for moderate values of 5, the loss rates can be 
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accurately estimated within a reasonable amount of time using standard simulation. In 
each experiment (using importance sampling) we first simulated 300 consecutive ,4-cycles 
so that the process (approximately) reaches steady state. Then we simulated for 60,000 bi- 
cycle pairs. We used b = 2,000 batches with 30 ,4-cycle pairs in each batch. (The variance 
estimate was fairly insensitive to the batch size for the same total number of ,4-cycles). For 
the sake of completeness, we also estimated the loss probabilities (n) of the low priority 
packets using the same simulation runs; we used the ,4-cycles for the estimation of the 
denominator (in Equation 6) to get samples of the total number of low priority arrivals in 
an A-cycle and we used the base sample path of the cycle used for the estimation of the 
numerator, to get samples of the number of low priority arrivals lost during an ,4-cycle. 

Results for these experiments are presented in Table 1. Note how the RE using impor- 
tance sampling remains almost constant as B becomes larger. For comparison purposes we 
also ran standard simulations for the same CPU time that was used for the importance sam- 
pling case. By standard simulation we mean that (after discarding the first 300 A -cycles) 
at each entrance to the set A> instead of simulating a pair of ,4-cycles, we only simulate the 
,4-cycle with no importance sampling. In addition to getting samples of the total number 
of arrivals of each priority level from this ,4-cycle, we also obtain samples of the number 
lost. For a given number of samples, i.e, pair of numerator/denominator A — cycle samples, 
importance sampling requires twice the number of A — cycle simulations as compared to 
standard simulation. Also, an A-cycle with importance sampling is more costly to simu- 
late due to the split paths and the computation of the likelihood ratios. However, as we 
can see in Table 1, these inefficiencies in the importance sampling method are insignificant 
as compared to the speedup gained due to the variance reduction. Note that with standard 
simulation, in many cases, we do not even get any samples of buffer overflow in an A— cycle. 
(In the tables, such cases are indicated by a ±?. The superscript * in some of the table 
entries means that the estimate had not yet stabilized.) 

To compute speedup, we need to compute the ratio of the CPU time required by standard 
simulation to achieve a given level of accuracy to the CPU time required by importance 
sampling to achieve the same level of accuracy. Due to the large amount of CPU time it 
took for the standard simulation estimates to stabilize, we were not able to compute the 
speedup for all cases, although this was possible for B = 6 and B — 8. For B = 6, for the 
same amount of CPU time, the RE of the standard simulation estimate (for r) is about 
6.7 (=20.2/3) times larger than that of the importance sampling estimate. Since the RE 
decreases at rate l/\/sample size, the standard simulation would have to be run about 45 
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(« 6.7 2 ) times longer in order to achieve the same RE as using importance sampling. Thus 
the B = 6 speedup is about 45. For B — 8, the variance estimate using standard simulation 
had not yet stabilized. For this case, we continued the standard simulation run until we 
obtained the same RE (±3.2%) as that obtained in Table 1 using importance sampling. The 
CPU time required to achieve this same accuracy was about 450 times greater than that 
measured for importance sampling, i.e., the 5 = 8 speedup is about 450. (The standard 
simulation estimate of r was 3.13 x 10~ 5 ± 3.2% compared to 3.11 x 10“ 5 ± 3.2% using 
importance sampling.) As r decreases, the speedup using importance sampling increases, 
although the speedup becomes more difficult to compute. 

Next we consider a larger example with 16 two-state Markov modulated input sources 
(of the same type as in the previous example). The pn’ s and P22’ s of the 16 sources were 


(0.7, 0.6, 0 6, 0.5, 0.8, 0.8, 0.8, 0.8, 0.8, 0.7, 0.7, 0.7, 0.9, 0.9, 0.9, 0.9) 


and 


(0.5, 0.8, 0.6, 0.9, 0.3, 0.6, 0.6, 0.4, 0.8, 0.5, 0.9, 0.6, 0.7, 0.8, 0.6, 0.5), 

respectively. The first twelve sources are considered to be of high priority with a total 
arrival rate of 5.57 and the last four sources are considered to be of low priority with a 
total arrival rate of 0.95. The switch speed c was chosen to be 8. Using Equation 20 and 
Equations 22 the 9\ and 9% were computed as 0.34375 and 1.0625, respectively. 

We again estimated loss probabilities for several values of B and a fixed /. Note that 
the Markov chain model for this example will have approximately 2 16 (5 +1) states and 
hence it is very difficult to compute the loss probabilities numerically. Therefore, simulation 
using importance sampling is very useful. The initial transient deletion period was again 
300 A-cycles. The simulation was run for 150,000 A-cycle pairs, i.e., 5,000 batches with 30 
A-cycle pairs in each batch. Notice again how the RE using importance sampling remains 
bounded as B — ► oo whereas with standard simulation, in all cases, we did not observe 
any high priority packet losses. With importance sampling, loss rates as low as 10“ 23 were 
estimated in a reasonable amount of time with a high degree of accuracy. 


5 Conclusions 

This paper has considered the application of importance sampling to estimating the packet 
loss rate in an ATM communications switch having a shared buffer and two priority classes. 
When the buffer contents reach a threshold level, B \ , low priority packets are rejected. When 
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the buffer contents reaches its capacity, B\ + B 2 , high priority packets are rejected. The 
object of the analysis is to estimate the loss rate of high priority packets. An asymptotically 
optimal importance sampling approach was described. Experiments using this approach 
showed that many orders of magnitude speedup, as compared to standard simulation, can 
be obtained. The speedup increases as the packet loss rate decreases. 

This importance sampling approach is unusual in several respects: 

1 . The rare event of interest occurs because of the occurrence of two different events, 
each of which is rare. Importance sampling has to be done so as to make each of these 
rare events happen, one after the other. Since the best change of measure is different 
for each of these rare events, the simulation program has to shift from one importance 
sampling strategy to another at appropriate instances in time. 

2 . The situation is further complicated by the fact that the second rare event may not 
happen immediately after the first rare event. As a result, a more complicated “split- 
ting” procedure needs to be introduced so as to get an efficient overall simulation 
procedure. 

Based on the analysis, it is clear that this approach can be extended to the case when 
there are more than two priority levels, e.g., three priority levels with an admission policy 
determined by three threshold levels, B \ , B\ + B2, and B\ + B2 + B3. There would be three 
different values of 0 , 0 *, 6% and 0 with which to do importance sampling. Exponential 
twisting of the high, medium and low priority streams with parameter 0 J satisfying a'j i {0 \ ) + 
a * M { 0 \) + a c(Q\) = c i s done initially at the start of an A-cycle until B\ is reached. Then 
exponential twisting of the high and medium priority streams with parameter 0 5 satisfying 
a^( 0 2) + a M^2) ~ c * s done until B\ + B 2 is reached. Finally, exponential twisting of the 
high priority streams with parameter #3 satisfying a^( 0 3) = c is done until B\ + B2 + B3 
is reached. In addition, a multi-level splitting procedure needs to be applied; the jr-th 
split path obtained on the j-th upcrossing of B\ also needs to be split upon upcrossings of 
B\ + B2 ■ This results in a rather complicated simulation algorithm. 

Other extensions are not so clear. For example, suppose when the buffer reaches level 
B\ , the switch displaces low priority packets already in the buffer with high priority arriving 
packets. Now the change of measure required to move the buffer from level B\ to B\ + B2 
is not obvious. 

This paper thus illustrates both the potential and limitations of importance sampling. 
In certain applications, there is enough structure so that highly effective importance sam- 
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pling schemes can be devised. In such cases, importance sampling results in spectacular 
improvements over standard simulation. However, while some applications of genuine inter- 
est, such as the one considered here, can be handled optimally, the class of such applications 
is rather narrow. Furthermore, small changes in the problem structure can easily lead to 
models for which the optimal, or even a good, importance sampling change of measure is 
unknown. 


A Appendix A 


In this appendix we prove E$*[J 2 ] < oo. Our approach is based on a sequence of stochastic 
comparisons. We note that J — 1 is bounded above by the last time that the queue is 
below B\ in an ,4-cycle when started from the level B\ + d (where d is the bound for the 
overshoot when level B\ is upcrossed). Denote this last time by t\. Now we consider an 
auxiliary queue in which the lower priority arrival streams are allowed to be stored in the 
buffer even when the buffer reaches the level B \ . Let T 2 be the last time that the auxiliary 
queue is below B\ in an ,4-cycle when started from B\ + d. Since there are more arrivals 
in the auxiliary queue than the original queue, it then follows from a standard sample path 
argument that T 2 is stochastically larger than or equal to T\ y i.e., t<i > st T\. Now we remove 
both boundaries at levels B and 0 of the auxiliary queue and consider the corresponding 
random walk started from the same level B\ + d. Let 73 be the last time that the random 
walk is below level B\ . Clearly, 73 > r <2 since there are no losses in the random walk and 
there are sample paths that the random walk might reach level 0 and then bounce back to 
level B\. Now we will show that the distribution of 7*3 is bounded by an exponential tail and 
hence all the moments exist. Via the stochastic comparisons made above, all the moments 
of J - 1 exist and E$*[J 2 } < 00 . 

To show that the tail distribution of t 3 is bounded exponentially, let A(t) denote the 
total number of arrivals in the interval [l,t]. Thus, the level of the random walk at time t is 
A(t)-ct+Bi+d. From the definition for the last time, P(ts > s) = P(m&x t>s [A(t)—ct+d\ > 
0). Note that for ail e > 0 and 6 > 0 


/’(maxbAfa) - ct + d] > 0) = P(max[,4(J) — ct + d + et — et] > 0) 

t>8 t>S 

< P(max[,4(^) — (c — c)< + d] > cs) 

t>s 

< P(max[A(t) — (c — t )t + d] > €s) 

= P (max[exp(0(j4(£) — (c — e)t + d))] > exp(^£3)^ . 


( 31 ) 

(32) 

(33) 

(34) 


15 



Applying Markov’s inequality and replacing the maximum by the sum yields 


P(t 3 >s)< 

t> o 


From Equation 10, it follows that 


£[ e *M(0-(c-0‘)] _ E 6 [L(0, 

= £*[#(£, f)e (a * ( * )_(c_t))< ] 


( 35 ) 


(36) 

(37) 


Since the state space of each source is finite, H(0,t) is bounded. Thus, there exists a 
constant d 2 (0) < oo for 6 > 0 such that H(0,t ) < d 2 (0). Also, since 6\ > 0 is the solution 
of a*(0 ) = c, for any ( sufficiently small, there exists a 0 < O' < 0\ such that a*(0') < c — 2c. 
Thus, 

E ^ e 6'(A(t)-(c- «)t)] < d 2 (0') e ~ 9 ' ct . (38) 

In conjunction with (35), one has 

P(t 3 >s)< </>(0')e- ff ' (a , (39) 

where 

<}>(0') = d 2 (0')e e '^- f) (l - e _fl,f ) _1 .D (40) 


B 

/ 

Quantity 

Estimated 

Importance Sampling 
Estimate 

Standard Simulation 
Estimate 

6 

0.5 

r 

4.34 x 10~ 4 ± 3.0% 

4.72 x 10" 4 ±20.2% 



r\ 

3.72 x 10 -2 ± 4.7% 

3.92 x 10 -z ± 6.1% 

8 

0.5 

T 

3.11 x 10 -5 ± 3.2% 

4.19 x 10 -5 ± 68.4%* 



n 

1.59 x 10 -2 ± 4.7% 

1.73 x 10 -2 ± 8.9% 

20 

0.5 

r 

5.47 x 10 -i:i ± 3.8% 

0.00±? 



ci 

9.53 x 10~ 5 ± 5.1% 

12.98 x lO" 5 ± 70.8%* 

30 

0.5 

r 

1.24 x 10 _1/ ± 4.4% 

0.00±? 



Cl 

1.29 x 10 _s ± 5.0% 

0.00±? 

40 

0.5 

r 

3.02 x lO -23 ±4.8% 

0.00±? 



Cl 

1.79 x 10““ ± 5.2% 

0.00±? 


Table 1: Estimates of steady-state loss probabilities in the small example. Importance 
sampling and standard simulation were run for the same amount of CPU time. 
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B 

/ 

Quantity 

Estimated 

Importance Sampling 
Estimate 

Standard Simulation 
Estimate 

24 

1/3 

r 

1.13 x 10 -7 ± 4.4% 

0.00±? 



n 

1.15 x 10 -2 ± 2.2% 

1.17 x 10 -2 ± 3.9% 

30 

1/3 

r 

8.63 x 10 -1U ± 4.8% 

0.00±? 



n 

5.74 x 10" 3 ± 2.2% 

5.96 x 10" a ±5.1% 

45 

1/3 

r 

4.32 x 10 -1 ^ ± 7.6% 

0.00±? 



n 

1.04 x 10 -3 ± 2.2% 

1.06 x lO-^i 10.0% 

60 

1/3 

r 

2.15 x 10 _2 ° ± 4.9% 

0.00±? 



n 

1.94 x 10" 4 ± 2.2% 

2.01 x 10- 4 ± 19.8% 


Table 2: Estimates of steady-state loss probabilities in the large example. Importance 
sampling and standard simulation were run for the same amount of CPU time. 



Figure 1: The base path and the split paths for the case of J — 2 upcrossing s of B\. 
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